CHAPTER 8 Getting Your Data into the Computer 107

unknown, refused, or not applicable). The goal is to make sure that for every cate-

gorical variable, a numerical code is entered and the cell is not left blank.

Never try to cram multiple choices into one column! For example, don’t enter 1, 2

into a cell in the CaregiverType column to indicate the patient has a nurse and phy-

sician. If you do, you have to painstakingly split your single multi-valued column

into separate two-state flag columns (described earlier) before you analyze the

data. Why not do it right the first time?
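
To show the clean-up work that a multi-valued column forces on you, here is a minimal Python sketch. The code values (1 = nurse, 2 = physician) and the flag column names HasNurse and HasPhysician are assumptions made up for this illustration; they are not defined anywhere in the data described above.

```python
# Hypothetical codes for the CaregiverType column (assumed for illustration).
CAREGIVER_CODES = {1: "HasNurse", 2: "HasPhysician"}

def split_multivalued(cell):
    """Turn a cell like '1, 2' into separate two-state (0/1) flag columns."""
    # Collect the codes typed into the cell, ignoring empty fragments.
    entered = {int(part) for part in cell.split(",") if part.strip()}
    # One flag per known code: 1 if it was entered, 0 if it was not.
    return {flag: int(code in entered) for code, flag in CAREGIVER_CODES.items()}

rows = ["1, 2", "2", "1", ""]  # made-up cell contents
flags = [split_multivalued(cell) for cell in rows]
for cell, row_flags in zip(rows, flags):
    print(repr(cell), "->", row_flags)
```

Entering the flags as separate columns in the first place makes this step unnecessary.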

Recording numerical data

For numerical data (meaning interval and ratio data), the main issue is how much

precision to record. Recording a numeric value to as many decimals as you have

available is usually best. For example, if a scale can measure body weight to the

nearest tenth of a kilogram, record it in the database to that degree of precision.

You can always round off to the nearest kilogram later if you want, but you can

never “unround” a number to recover digits you didn’t record. So record values

from measurement instruments at the full precision the instrument provides.
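
A short Python sketch of why rounding is a one-way street. The three weights are invented for this example.

```python
# Hypothetical body weights recorded to the nearest tenth of a kilogram.
weights_kg = [72.4, 71.6, 72.3]

# Rounding later is easy: the computer does it in one step.
rounded = [round(w) for w in weights_kg]
print(rounded)

# Going the other way is impossible: three different measurements
# have collapsed into one value, and the tenths digit is gone.
distinct_before = len(set(weights_kg))
distinct_after = len(set(rounded))
print(distinct_before, "distinct values before rounding,", distinct_after, "after")
```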

Along the same lines, don’t group numerical data into intervals when recording it.

If you know the age to the nearest year, don’t record Age in 10-year intervals (such

as 20 to 29, 30 to 39, 40 to 49, and so on). You can always have the computer do

that kind of grouping later, but you can never recover the age in years if all you

record is the decade.
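
Having the computer do the grouping later can be sketched in a few lines of Python. The ages and the helper name decade_label are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical ages recorded to the nearest year.
ages = [23, 29, 34, 41, 47, 38]

def decade_label(age):
    """Return the 10-year interval an age falls into, such as '20 to 29'."""
    low = (age // 10) * 10  # integer division drops the last digit
    return f"{low} to {low + 9}"

# Derive the grouped variable from the exact ages; the ages stay intact.
groups = [decade_label(age) for age in ages]
counts = Counter(groups)
for interval, n in sorted(counts.items()):
    print(interval, n)
```

Because the exact ages are still in the data, you can regroup them into different intervals (5-year bands, for example) at any time.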

Some statistical programs let you store numbers in different formats. The pro-

gram may refer to these different storage modes using arcane terms for short, long,

or very long integers (whole numbers) or single-precision (short) or double-precision

(long) floating point (fractional) numbers. Each type has its own limits, which may

vary from one program to another or from one kind of computer to another. For

example, a short integer may be able to represent only whole numbers within the

range from −32,768 to +32,767, whereas a double-precision floating-point number

could easily handle a number like 1.23456789012345 × 10^250. Excel has no trouble

storing numerical data in any of these formats, so to make these choices, it is best

to study the statistical program you will use to analyze the data. That way, you can

make rules for storing the data in Excel that make it easy for you to analyze the

data once it is imported into the statistical program.
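
The limits of these storage modes can be explored with a short Python sketch. It uses Python's standard struct module to imitate a 16-bit short integer and a single-precision number. Your statistical program may name and size its types differently, so treat this as an illustration, not a description of any particular package. The sample weight is invented.

```python
import struct
import sys

# A 16-bit signed ("short") integer holds only -32,768 through 32,767.
struct.pack("<h", 32767)  # the largest value fits
try:
    struct.pack("<h", 32768)  # one past the limit
    short_overflowed = False
except struct.error:
    short_overflowed = True
print("32768 fits in a short integer:", not short_overflowed)

# A double-precision number easily holds a value on the order of 10^250.
big = 1.23456789012345e250
double_ok = big < sys.float_info.max
print("Fits in double precision:", double_ok)

# Single precision keeps only about 7 significant digits, so a value
# stored that way comes back slightly changed.
weight = 72.123456789  # hypothetical measurement
as_single = struct.unpack("<f", struct.pack("<f", weight))[0]
print("Original:", weight, "after single-precision storage:", as_single)
```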

Following are issues to consider with respect to numerical variables in Excel:

» Don’t put two numbers (such as a blood pressure reading of 135 / 85 mmHg)

into one column of data. Excel won’t complain about it, but it will treat it as